Design and Implementation of MPICH2 over InfiniBand with RDMA Support
For several years, MPI has been the de facto standard for writing parallel
applications. One of the most popular MPI implementations is MPICH. Its
successor, MPICH2, features a completely new design that provides more
performance and flexibility. To ensure portability, it has a hierarchical
structure that allows porting to be done at different levels. In this
paper, we present our experiences designing and implementing MPICH2 over
InfiniBand. Because of its high performance and open standard, InfiniBand is
gaining popularity in the area of high-performance computing. Our study focuses
on optimizing the performance of MPI-1 functions in MPICH2. One of our
objectives is to exploit Remote Direct Memory Access (RDMA) in InfiniBand to
achieve high performance. We have based our design on the RDMA Channel
interface provided by MPICH2, which encapsulates architecture-dependent
communication functionalities into a very small set of functions. Starting with
a basic design, we apply different optimizations and also propose a
zero-copy-based design. We characterize the impact of our optimizations and
designs using microbenchmarks. We have also performed an application-level
evaluation using the NAS Parallel Benchmarks. Our optimized MPICH2
implementation achieves 7.6 microseconds latency and 857 MB/s bandwidth, which are
close to the raw performance of the underlying InfiniBand layer. Our study
shows that the RDMA Channel interface in MPICH2 provides a simple, yet
powerful, abstraction that enables implementations with high performance by
exploiting RDMA operations in InfiniBand. To the best of our knowledge, this is
the first high-performance design and implementation of MPICH2 on InfiniBand
using RDMA support.Comment: 12 pages, 17 figure
Origin and tuning of the magnetocaloric effect for the magnetic refrigerant MnFe(P1-xGex)
Neutron diffraction and magnetization measurements of the magnetic refrigerant
Mn1+yFe1-yP1-xGex reveal that the ferromagnetic and paramagnetic phases
correspond to two very distinct crystal structures, with the magnetic entropy
change as a function of magnetic field or temperature being directly controlled
by the phase fraction of this first-order transition. By tuning the physical
properties of this system we have achieved a maximum magnetic entropy change
exceeding 74 J/kg K for both increasing and decreasing field, more than twice
the value of the previous record.
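For context, the magnetic entropy change quoted above is conventionally extracted from magnetization isotherms via the thermodynamic Maxwell relation (a standard textbook result, not specific to this paper):

```latex
\Delta S_M(T, \Delta H) \;=\; \mu_0 \int_0^{H_{\max}} \left( \frac{\partial M}{\partial T} \right)_{H} \, dH
```

For a first-order transition such as the one described here, where the entropy change tracks the phase fraction of two distinct crystal structures, this relation must be applied with care near the transition.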
Marine hydrographic spatial-variability and its cause at the northern margin of the Amery Ice Shelf
Conductivity, temperature and depth(CTD) data collected along a zonal hydrographic section from the northern margin of the Amery Ice Shelf on 25–27 February 2008 by the 24th Chinese National Antarctic Research Expedition (CHINARE) cruise in the 2007/2008 austral summer are analyzed to study thermohaline structures. Analysis reveals warm subsurface water in a limited area around the east end of the northern margin, where the temperature, salinity and density have east-west gradients in the surface layer of the hydrographic section. The localization of the warm subsurface water and the causes of the CTD gradients in the surface layer are discussed. In addition, the results from these CTD data analyses are compared with those from the 22nd CHINARE cruise in the 2005/2006 austral summer. This comparison revealed that the thermoclines and haloclines had deepened and their strengths weakened in the 2007/2008 austral summer. The difference between the two data sets and the cause for it can be reasonably explained and attributed to the change in ocean-ice-atmosphere interactions at the northern margin of the Amery Ice Shelf
Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. However, even with InfiniBand, network bandwidth can become the performance bottleneck for some of today's most demanding applications. In this paper, we study the problem of overcoming the bandwidth bottleneck by using multirail networks. We present different ways of setting up multirail networks with InfiniBand (multiple HCAs, multiple ports, and virtual multirail configurations) and propose a unified MPI design that can support all of these approaches. We discuss important design issues (out-of-order message handling, handling multiple HCAs) and provide an in-depth discussion of different policies for using multirail networks. We also propose an adaptive striping scheme that can dynamically change the striping parameters based on current system conditions. We have implemented our design and evaluated it with microbenchmarks and applications. Our performance results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we achieve almost twice the bandwidth and half the latency for large messages compared with the original MPI implementation. Depending on the application-level communication pattern, the multirail MPI implementation can also significantly reduce communication time and total execution time. Finally, we show that the adaptive striping scheme achieves excellent performance without a priori knowledge of the bandwidth of each rail.
High Performance RDMA-Based MPI Implementation over InfiniBand
Although the InfiniBand Architecture is relatively new in the high-performance computing area, it offers many features that help improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA) operations. In this paper, we propose a new design of MPI over InfiniBand that brings the benefit of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 million bytes (831 megabytes) per second. Performance evaluation at the MPI level shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22%. For large messages, we improve performance by reducing the time for transferring control messages. We have also shown that our new design is beneficial to MPI collective communication and the NAS Parallel Benchmarks.